CV4Edu 2026

CV4Edu - Computer Vision for Education

Computer vision (CV) plays a central role in multimodal human-centered AI, yet most models are trained on web-scale benchmarks that poorly reflect real classrooms. Educational data are noisy, private, small-scale, and multimodal (e.g., video, audio, text). Students’ cognitive/behavioral states (e.g., engagement, mind-wandering) and learning processes (e.g., self-regulation, collaboration) can be inferred from subtle multimodal cues (e.g., gaze, pose, facial features). Still, today’s models struggle to generalize to classroom data, limiting reliability in deployed human-centered applications (e.g., assistive technology, collaborative AI). CV4Edu brings together computer vision, natural language processing, human-computer interaction, and educational researchers to chart a community agenda for efficient, privacy-aware multimodal data-driven models that are more reliable in low-resource, real-world classroom settings — potentially launching shared datasets, metrics, and unified practices.

Our goal is to support research that bridges CV, NLP, HCI, cognitive science, and the learning sciences/education communities. We welcome submissions both within and beyond education contexts—such as multimodal modeling, sensing, behavior forecasting, cognitive state inference, robotics, and embodied AI—provided they discuss transferability to classroom settings (e.g., what may break or carry over under noise, occlusions, viewpoints, multi-person dynamics, privacy constraints, limited annotations, distribution shift, hardware variability).

Topics

The workshop topics include (but are not limited to):

Multimodal classroom perception

Face, gaze, pose, gesture, posture, affect, and prosody
Video, audio, gaze sensors, and wearables (egocentric and exocentric)
Multimodal fusion, representation learning, and cross-view / multi-camera setups

Language-centered multimodal learning analytics

Linking speech/text to video events, gaze/attention, and instructional context
Classroom NLP: ASR robustness, diarization, evaluating and mitigating bias, discourse modeling, dialogue/tutoring interactions, simplification, misconception detection
Retrieval-augmented classroom analytics, model adaptation, evaluation for learning-aligned outcomes

Robustness & generalization

Domain shift beyond the lab, occlusions, noisy data, and missing modalities
Few-/low-shot learning, continual and on-device adaptation
Generalization across classroom layouts and populations

Human behavior modeling for learning

Engagement, attention, affect, confusion, self-regulation, and metacognition
Collaboration, group dynamics, and teacher–student interactions
Gaze-informed models, saliency/scanpath prediction, activity recognition

Temporal modeling & intervention

Sequential/temporal models of learning processes
Behavioral forecasting, early-warning systems, and interventions
Real-time inference, feedback, and human-in-the-loop systems

Interpretability, reliability & evaluation

Interpretable models, uncertainty estimation, and calibration
OOD detection, fairness, and bias analysis
Evaluation protocols aligned with learning outcomes

Privacy-aware AI, datasets & deployments

Privacy-preserving data collection, anonymization, de-identification, and governance
Annotation strategies, construct-aligned labeling, active learning, synthetic data, and dataset curation
Classroom-ready systems, scalable multimodal data-collection frameworks, edge/on-device inference, and real-world deployments

We encourage general computer-vision, visually grounded NLP, and human-centered, collaborative AI submissions (e.g., behavioral modeling, pose/activity recognition, gaze estimation, attention modeling, multimodal learning, methods “in the wild”, cognitive state inference and forecasting) that make a clear connection to educational/learning environments (even if primarily in the discussion).

Accepted Papers

Archival Papers

1. Mahsa Ardakani, Arshia Eslami, Ramtin Zand

2. Ahmed Abdelkawy, Ahmed Elsayed, Asem Ali, Aly Farag, Thomas Tretter, Michael McIntyre

3. Ziwei Zhao, Xizi Wang, Yuchen Wang, Feng Cheng, David J. Crandall

4. Wen-Hsin Tsai, Chia-Ming Lee, Yuk-Ying Tung

5. Ethan Seefried, Changsoo Jung, Videep Venkatesha, Trevor Chartier, Caleb Christian, Jack Fitzgerald, Mariah Bradford, Sifatul Anindho, Matthew Sturgeon, Nathaniel Blanchard

6. Martyna Gruszka, Risa Shinoda, Taiki Miyanishi, Takumi Hirose, Nakamasa Inoue

7. Muhammad Rafsan Kabir, Md Shopon, Marina Gavrilova

8. Xiao Wang, Lu Dong, Ifeoma Nwogu, Srirangaraj Setlur, Venu Govindaraju

9. Chongyu He, Peter Youngs, Scott Acton

10. Hanchen David Wang, Yilin Liu, Madison Mason, Surya Rayala, Gautam Biswas, Daniel Levin, Meiyi Ma

11. Lu Dong, Xiao Wang, Mark Frank, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu

12. Sifatul Anindho, Videep Venkatesha, Nathaniel Blanchard

13. Yuji Zhang, Duo Zhou, Bo Chen, Adi Chalasani, Noah Schroeder, H Chad Lane, ChengXiang Zhai

14. Ekta Sood, Sebastian Ricke, Trisha Mittal, Sidney K. D'Mello

15. Ashwin T S, Srigowri Mayasandra Prasanna, Joyce Horn Fonteles, Gautam Biswas

16. Divya Mereddy, Ashwin T S, Marcos Quinones Grueiro, Gautam Biswas

17. Suraj Prasad, Pinak Mahapatra

Non-Archival Papers

1. Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou

2. Zeyu Zhu, Kevin Qinghong Lin, Mike Zheng Shou

3. Pinak Mahapatra, Suraj Prasad

4. Twumasi Mensah-Boateng, Anirban Roy, Marta K. Mielicki, Nonye M Alozie, Ramneet Kaur, Jing Yuan

5. Aayam Bansal

6. Aman Goyal, Kshama Nitin Shah

7. Aryan Kashyap Naveen, Bhuvanesh Singla, Raajan R Wankhade, Shreesha M, Ramu S, Ram Mohana Reddy Guddeti

8. Yanhang Li, Zhichao Fan, Zexin Zhuang

9. Yanhang Li, Zhichao Fan, Zexin Zhuang

10. Siddharth Manne, Shaden Alshammari, Satish Somaraju

11. Videep Venkatesha, Ethan Seefried, Changsoo Jung, Nathaniel Blanchard

12. Thai Quoc Hoang, Ran Xu

Invited Speakers

Panelists

Dr. Gautam Biswas

Vanderbilt University

Dr. Nikhil Krishnaswamy

Colorado State University

Dr. Nathaniel Blanchard

Colorado State University

Dr. David Chen

University of California, Berkeley

Joyce Horn Fonteles

Vanderbilt University

Mariah Bradford

Colorado State University

Workshop Schedule - June 4 - Room 113

1:00PM	Opening and Goals
1:10PM	Keynotes 1 and 2
2:15PM	Poster Session @ Hall A
3:00PM	Coffee Break
3:15PM	Poster Session @ Hall A (cont.)
4:00PM	Keynotes 3 and 4
5:00PM	Panel and Community Discussion
5:55PM	Closing and Next Steps

Venue

Colorado Convention Center
700 14th Street
Denver CO 80202

The workshop will be held together with CVPR 2026.

Workshop Organizers

For any questions about the workshop, please contact cv4edu.cvpr@gmail.com

Ekta Sood

University of Colorado Boulder

Joyce Horn Fonteles

Vanderbilt University

Mariah Bradford

Colorado State University

Paul Gavrikov

Independent researcher

Prajit Dhar

University of Marburg

Janis Pagel

University of Cologne

Trisha Mittal

Dolby Laboratories

Gautam Biswas

Vanderbilt University

Sidney D'Mello

University of Colorado Boulder

List of Sponsors

CV4Edu 2026 is made possible by these organizations.

Computer Vision × Education: Building a Cross-Community Agenda for Multimodal Vision in Classrooms

CV4Edu - CVPR 2026 Workshop

June 4, 2026

Dr. Mohit Bansal, Keynote Speaker, UNC Chapel Hill

Understanding Student Engagement & Teacher Facilitation in the Classroom via Multimodal Video Reasoning

Dr. Scott Acton, Keynote Speaker, University of Virginia

Advancing Instruction through Computer Vision

Dr. Marcelo Worsley, Keynote Speaker, Northwestern University

Title TBA

Dr. Jacob Whitehill, Keynote Speaker, Worcester Polytechnic Institute

Computer Vision for Classroom Observation - Privacy, Bias, and Multimodality

CSU101: An Educational Dataset for Introductory Computer Vision

Ethan Seefried, Changsoo Jung, Videep Venkatesha, Trevor Chartier, Caleb Christian, Jack Fitzgerald, Mariah Bradford, Sifatul Anindho, Matthew Sturgeon, Nathaniel Blanchard

MES-Bench: A Benchmark for Multimodal Elaborative Simplification and Comprehensibility Evaluation in Language Learning

Martyna Gruszka, Risa Shinoda, Taiki Miyanishi, Takumi Hirose, Nakamasa Inoue

ReSoFed: Reliability-Guided Model Souping for Robust Federated Learning in Heterogeneous Classroom Environments

Muhammad Rafsan Kabir, Md Shopon, Marina Gavrilova

InterventionLens: A Multi-Agent Framework for Detecting ASD Intervention Strategies in Parent-Child Shared Reading

Xiao Wang, Lu Dong, Ifeoma Nwogu, Srirangaraj Setlur, Venu Govindaraju

Delta-Gated Incremental Multi-Forward-Pass Modeling for Robust Multimodal Classroom Video Understanding

Chongyu He, Peter Youngs, Scott Acton

AI-Assisted Competency Assessment from Egocentric Video in Simulation-Based Nursing Education

Hanchen David Wang, Yilin Liu, Madison Mason, Surya Rayala, Gautam Biswas, Daniel Levin, Meiyi Ma

ConfusionBench: An Expert-Validated Benchmark for Confusion Recognition and Localization in Educational Videos

Lu Dong, Xiao Wang, Mark Frank, Srirangaraj Setlur, Venu Govindaraju, Ifeoma Nwogu

Evaluating Web-trained Facial Expression Recognition in Naturalistic Collaborative Learning

Sifatul Anindho, Videep Venkatesha, Nathaniel Blanchard

Scaffolding Human Learning by Shaping Visual Environment

Yuji Zhang, Duo Zhou, Bo Chen, Adi Chalasani, Noah Schroeder, H Chad Lane, ChengXiang Zhai

From Emotion Recognition to Mind-Wandering Detection: A Comparative Analysis of Video-Based Emotion Foundation Models

Ekta Sood, Sebastian Ricke, Trisha Mittal, Sidney K. D'Mello

Do Emotion Recognition Models Generalize to Classrooms? Robustness and Fairness Analysis

Ashwin T S, Srigowri Mayasandra Prasanna, Joyce Horn Fonteles, Gautam Biswas

Diagnosis of Human–Object Interaction Detectors for Real-World Educational Applications

Divya Mereddy, Ashwin T S, Marcos Quinones Grueiro, Gautam Biswas

Speech-Synchronized Whiteboard Generation via VLM-Driven Structured Drawing Representations

Suraj Prasad, Pinak Mahapatra

VLMath: A Multimodal Vision-Language System for Pedagogically Aligned Math Tutoring

Mahsa Ardakani, Arshia Eslami, Ramtin Zand

Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification

Ahmed Abdelkawy, Ahmed Elsayed, Asem Ali, Aly Farag, Thomas Tretter, Michael McIntyre

Sequence-Based Identification of First-Person Camera Wearers in Third-Person Views

Ziwei Zhao, Xizi Wang, Yuchen Wang, Feng Cheng, David J. Crandall

Cross-modal Affinity-aligned Multimodal Learning Analytics for Predicting Student Collaboration Satisfaction in Game-Based Learning

Wen-Hsin Tsai, Chia-Ming Lee, Yuk-Ying Tung

Zero-Shot Vision-Language Models for Classroom Engagement Recognition: A Benchmark Study of Prompt Sensitivity and Cross-Dataset Generalization

Aman Goyal, Kshama Nitin Shah

AutoOEP: A Multi-Cue Framework for Online Exam Proctoring

Aryan Kashyap Naveen, Bhuvanesh Singla, Raajan R Wankhade, Shreesha M, Ramu S, Ram Mohana Reddy Guddeti

How Much Do We Lose? Quantifying the Impact of Face De-identification on Classroom Computer Vision Tasks

Yanhang Li, Zhichao Fan, Zexin Zhuang

Do We Need Faces? Privacy-Preserving Engagement Detection via Face-Free Features in Classroom Video

Yanhang Li, Zhichao Fan, Zexin Zhuang

Socratic: Pushing the Boundaries of Interactive Visual Educational Content Creation.